Existing analyses of asynchronous stochastic gradient descent (SGD) degrade dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same asynchronous SGD algorithm regardless of the delays in the gradients, depending instead just on the number of parallel devices used to implement the algorithm. Our guarantees are strictly better than the existing analyses, and we also argue that asynchronous SGD outperforms synchronous minibatch SGD in the settings we consider. For our analysis, we introduce a novel recursion based on "virtual iterates" and delay-adaptive stepsizes, which allows us to derive state-of-the-art guarantees for both convex and non-convex objectives.
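To make the algorithmic idea concrete, here is a minimal simulation sketch of asynchronous SGD with a delay-adaptive stepsize on a toy quadratic objective. The rule eta_t = eta / (1 + tau_t) and all parameters are illustrative assumptions, not the exact schedule from the paper.

```python
# A minimal sketch of asynchronous SGD with a delay-adaptive step size,
# simulated on a toy quadratic objective. The rule eta / (1 + tau) is one
# illustrative choice; the exact schedule in the paper may differ.
import heapq
import numpy as np

rng = np.random.default_rng(0)
d, M, T = 10, 4, 2000                    # dimension, workers, total updates
A = rng.standard_normal((d, d)); A = A @ A.T / d   # f(x) = 0.5 x^T A x
x, eta = rng.standard_normal(d), 0.1

# Each worker holds the iterate it last read; a heap orders gradient arrivals.
events = [(rng.exponential(1.0), w, x.copy(), 0) for w in range(M)]
heapq.heapify(events)
t = 0
while t < T:
    arrival, w, x_stale, t_read = heapq.heappop(events)
    tau = t - t_read                     # delay of this gradient
    g = A @ x_stale                      # gradient at the stale iterate
    x -= eta / (1 + tau) * g             # delay-adaptive step
    t += 1
    # worker w immediately starts a new gradient on the current iterate
    heapq.heappush(events, (arrival + rng.exponential(1.0), w, x.copy(), t))

print("final f(x) =", 0.5 * x @ A @ x)
```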
Decentralized optimization is increasingly popular in machine learning for its scalability and efficiency. Intuitively, it should also provide better privacy guarantees, as nodes only observe the messages sent by their neighbors in the network graph. However, formalizing and quantifying this gain is challenging: existing results are typically limited to Local Differential Privacy (LDP) guarantees that overlook the advantages of decentralization. In this work, we introduce pairwise network differential privacy, a relaxation of LDP that captures the fact that the privacy leakage from a node $u$ to a node $v$ may depend on their relative position in the graph. We then analyze the combination of local noise injection with (simple or randomized) gossip averaging schemes on fixed and random communication graphs. We also derive a differentially private decentralized optimization algorithm that alternates between local gradient descent steps and gossip averaging. Our results show that our algorithms amplify privacy guarantees as a function of the distance between nodes in the graph, matching the privacy-utility trade-off of the trusted curator, up to factors that explicitly depend on the graph topology. Finally, we illustrate our privacy gains with experiments on synthetic and real-world datasets.
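As a rough illustration of the mechanism analyzed here, the sketch below combines one-shot local Gaussian noise injection with synchronous gossip averaging on a ring graph. The graph, noise scale, and iteration count are illustrative assumptions.

```python
# A hedged sketch of local noise injection followed by gossip averaging,
# in the spirit of pairwise network DP. Graph, noise scale, and iteration
# count are illustrative, not the paper's exact parameters.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, T = 20, 0.5, 200
values = rng.standard_normal(n)               # private local values

# Ring-graph gossip matrix: each node averages with its two neighbours.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

x = values + sigma * rng.standard_normal(n)   # one-shot Gaussian noise
for _ in range(T):
    x = W @ x                                 # synchronous gossip step

print("true mean:", values.mean(), " gossip estimate at node 0:", x[0])
```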
In decentralized optimization, the nodes of a communication network each possess a local objective function and communicate using gossip-based methods in order to minimize the average of these per-node functions. While synchronous algorithms are heavily impacted by a few slow nodes or edges in the graph (the \emph{straggler problem}), their asynchronous counterparts are notoriously hard to parametrize. Indeed, their convergence properties for networks with heterogeneous communication and computation delays have so far defied analysis. In this paper, we use the \emph{continuized} framework to analyze asynchronous algorithms in networks with delays. Our approach yields a precise characterization of the convergence time and of its dependence on the heterogeneous delays in the network. Our continuized framework benefits from the best of both the continuous and discrete worlds: the algorithms it applies to are based on event-driven updates. They are thus essentially discrete and hence readily implementable. Yet their analysis is essentially in continuous time, relying in part on the theory of delayed ODEs. Furthermore, our algorithms achieve an \emph{asynchronous speedup}: their rate of convergence is controlled by the spectral gap of the network graph weighted by local delays, rather than by the network-wide worst-case delay as in previous analyses. Our methods thus enjoy improved robustness to stragglers.
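The event-driven flavor of these algorithms can be sketched as follows: each edge carries an independent Poisson clock with its own rate, and a ringing edge triggers a pairwise averaging update between its endpoints. This toy simulation assumes a ring graph and plain gossip; it does not reproduce the accelerated continuized analysis.

```python
# A minimal event-driven simulation of asynchronous gossip: each edge has
# an independent Poisson clock and, when it rings, its two endpoints average
# their values. Rates and graph are illustrative assumptions.
import heapq
import numpy as np

rng = np.random.default_rng(0)
n = 10
edges = [(i, (i + 1) % n) for i in range(n)]     # ring graph
rates = rng.uniform(0.5, 2.0, size=len(edges))   # heterogeneous activation rates
x = rng.standard_normal(n)                       # local values to average

events = [(rng.exponential(1 / r), e) for e, r in enumerate(rates)]
heapq.heapify(events)
for _ in range(500):
    t, e = heapq.heappop(events)
    i, j = edges[e]
    x[i] = x[j] = 0.5 * (x[i] + x[j])            # pairwise averaging update
    heapq.heappush(events, (t + rng.exponential(1 / rates[e]), e))

print("spread after gossip:", x.max() - x.min())
```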
For long-term simultaneous planning, localization and mapping (SPLAM), a robot should be able to continuously update its map according to the dynamic changes of the environment and the new areas explored. With limited onboard computation capabilities, a robot should also be able to limit the size of the map used for online localization and mapping. This paper addresses these challenges using a memory management mechanism, which identifies locations that should remain in a Working Memory (WM) for online processing, distinguishing them from locations that should be transferred to a Long-Term Memory (LTM). When revisiting previously mapped areas that are in LTM, the mechanism can retrieve these locations and place them back in WM for online SPLAM. The approach is tested on a robot equipped with a short-range laser rangefinder and an RGB-D camera, autonomously patrolling 10.5 km in an indoor environment over 11 sessions and encountering 139 people.
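A highly simplified sketch of the WM/LTM bookkeeping is given below: locations are demoted to long-term memory when the working memory exceeds a budget, and retrieved back on revisit. The least-recently-observed eviction rule is an illustrative stand-in for the paper's weighting heuristics.

```python
# A simplified sketch of WM/LTM memory management: demote locations to LTM
# when WM exceeds its budget, retrieve them on revisit. The eviction policy
# here (least recently observed) is an illustrative assumption.
class MapMemory:
    def __init__(self, wm_capacity):
        self.wm_capacity = wm_capacity
        self.wm = {}    # location id -> (data, last_seen)
        self.ltm = {}   # location id -> data

    def observe(self, loc_id, data, time):
        if loc_id in self.ltm:                     # revisit: retrieve into WM
            self.wm[loc_id] = (self.ltm.pop(loc_id), time)
        else:
            self.wm[loc_id] = (data, time)
        while len(self.wm) > self.wm_capacity:     # demote oldest to LTM
            oldest = min(self.wm, key=lambda k: self.wm[k][1])
            self.ltm[oldest] = self.wm.pop(oldest)[0]

mem = MapMemory(wm_capacity=3)
for t, loc in enumerate([1, 2, 3, 4, 1, 5]):
    mem.observe(loc, f"scan-{loc}", t)
print("WM:", sorted(mem.wm), "LTM:", sorted(mem.ltm))
```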
Vision transformers have emerged as powerful tools for many computer vision tasks. It has been shown that their features and class tokens can be used for salient object segmentation. However, the properties of segmentation transformers remain largely unstudied. In this work, we conduct an in-depth study of the spatial attentions of different backbone layers of semantic segmentation transformers and uncover interesting properties. The spatial attentions of a patch intersecting with an object tend to concentrate within the object, whereas the attentions over larger, more uniform image areas instead follow a diffusive behavior. In other words, vision transformers trained to segment a fixed set of object classes generalize to objects well beyond this set. We exploit this by extracting heatmaps that can be used to segment unknown objects within diverse backgrounds, such as obstacles in traffic scenes. Our method is training-free and its computational overhead negligible. We use off-the-shelf transformers trained for street-scene segmentation to process other scene types.
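The extraction step can be sketched in a few lines: take the self-attention row of a chosen patch token, average it over heads, and reshape it into an image-sized heatmap. The snippet below uses random weights as a stand-in for an off-the-shelf segmentation transformer.

```python
# A self-contained sketch of heatmap extraction from spatial self-attention:
# the attention row of one patch token, averaged over heads, is reshaped
# into an image-shaped map. Weights are random stand-ins; in practice one
# would hook the attention of a trained segmentation transformer.
import torch
import torch.nn.functional as F

B, H, W, dim, heads = 1, 14, 14, 64, 4
tokens = torch.randn(B, H * W, dim)                 # patch tokens, no class token

qkv = torch.nn.Linear(dim, 3 * dim)
q, k, _ = qkv(tokens).chunk(3, dim=-1)
q = q.view(B, -1, heads, dim // heads).transpose(1, 2)
k = k.view(B, -1, heads, dim // heads).transpose(1, 2)
attn = (q @ k.transpose(-2, -1) / (dim // heads) ** 0.5).softmax(-1)

patch_idx = 7 * W + 7                               # a patch near the image centre
heatmap = attn[:, :, patch_idx, :].mean(1)          # average heads -> (B, H*W)
heatmap = heatmap.view(B, 1, H, W)
heatmap = F.interpolate(heatmap, scale_factor=16, mode="bilinear")
print(heatmap.shape)                                # (1, 1, 224, 224)
```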
Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar, without ground-truth input-translation pairs. Existing UEI2I methods represent style using either a global, image-level feature vector, or one vector per object instance/class, which requires knowledge of the scene semantics. Here, by contrast, we propose to represent style as a dense feature map, allowing for a finer-grained transfer to the source image without requiring any external semantic information. We then rely on perceptual and adversarial losses to disentangle our dense style and content representations, and exploit unsupervised cross-domain semantic correspondences to warp the exemplar style to the source content. We demonstrate the effectiveness of our method on two datasets using standard metrics together with a new localized style metric measuring style similarity in a class-wise manner. Our results evidence that the translations produced by our approach are more diverse and closer to the exemplars than those of the state-of-the-art methods while nonetheless preserving the source content.
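One way to picture the dense style transfer is as a correspondence-based warp: cosine similarity between source and exemplar features defines a soft assignment that transports per-location style vectors onto the source layout. The sketch below uses random tensors as stand-ins for encoder features and is only a schematic of this step, not the full method.

```python
# A hedged sketch of warping a dense style map onto source content via
# feature correspondences. Features are random stand-ins for encoder
# activations; the temperature 0.05 is an illustrative choice.
import torch
import torch.nn.functional as F

B, C, H, W = 1, 32, 16, 16
f_src = F.normalize(torch.randn(B, C, H, W), dim=1)    # source content features
f_exm = F.normalize(torch.randn(B, C, H, W), dim=1)    # exemplar features
style = torch.randn(B, C, H, W)                        # dense exemplar style map

src = f_src.flatten(2).transpose(1, 2)                 # (B, HW, C)
exm = f_exm.flatten(2)                                 # (B, C, HW)
corr = torch.softmax(src @ exm / 0.05, dim=-1)         # (B, HW, HW) soft matches

warped = corr @ style.flatten(2).transpose(1, 2)       # (B, HW, C)
warped = warped.transpose(1, 2).view(B, C, H, W)       # style aligned to source
print(warped.shape)
```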
The optimal layout of a complex system such as an aerospace vehicle consists of placing a given number of components in a container in order to minimize one or several objectives under geometrical or functional constraints. This paper presents an extended formulation of this problem as a variable-size design space (VSDS) problem to take into account a large number of architectural choices and component allocations during the design process. As a representative example of such systems, considering the layout of a satellite module, the VSDS aspect reflects the fact that the optimizer has to choose between several subdivisions of the components. For instance, one large fuel tank might be placed, or two smaller tanks, or three even smaller tanks holding the same amount of fuel. In order to tackle this NP-hard problem, a genetic algorithm enhanced by an adapted hidden-variables mechanism is proposed. The latter is illustrated on a toy case and on an aerospace application case representative of real-world complexity, demonstrating the performance of the proposed algorithm. The results obtained using the proposed mechanism are reported and analyzed.
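A toy version of the hidden-variables mechanism is sketched below: every individual carries genes for the maximum number of components, but only the first n_active are expressed in the fitness, and inactive genes survive variation so they can be reactivated later. The fitness function is a placeholder, not the satellite-layout objective.

```python
# A toy sketch of hidden variables for variable-size design spaces: genes
# beyond n_active are dormant but preserved through variation. The fitness
# is an illustrative placeholder, not the paper's layout objective.
import numpy as np

rng = np.random.default_rng(0)
MAX_COMP, POP, GENS = 3, 30, 100

def fitness(ind):
    n = ind["n_active"]
    pos = ind["pos"][:n]                      # only active components count
    # toy objective: keep n tanks near the origin, with a small cost per tank
    return -np.sum(pos ** 2) - 0.1 * n

pop = [{"n_active": rng.integers(1, MAX_COMP + 1),
        "pos": rng.uniform(-1, 1, MAX_COMP)} for _ in range(POP)]

for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[: POP // 2]
    children = []
    for parent in survivors:
        child = {"n_active": parent["n_active"],
                 "pos": parent["pos"] + 0.05 * rng.standard_normal(MAX_COMP)}
        if rng.random() < 0.1:                # architectural mutation
            child["n_active"] = rng.integers(1, MAX_COMP + 1)
        children.append(child)
    pop = survivors + children

best = max(pop, key=fitness)
print("best architecture:", best["n_active"], "fitness:", fitness(best))
```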
Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program. This technique is considered the de facto standard for computing derivatives in many machine learning and optimisation software tools. Despite the practicality of this technique, the performance of the differentiated programs, especially for functional languages and in the presence of vectors, is suboptimal. We present an AD system for a higher-order functional array-processing language. The core functional language underlying this system simultaneously supports both source-to-source forward-mode AD and global optimisations such as loop transformations. In combination, gradient computation with forward-mode AD can be as efficient as reverse mode, and the Jacobian matrices required for numerical algorithms such as Gauss-Newton and Levenberg-Marquardt can be efficiently computed.
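For readers unfamiliar with forward-mode AD, the dual-number sketch below shows the mode the system builds on: each value carries a tangent, and arithmetic propagates derivatives alongside primal values. It illustrates the mechanism only, not the array language or loop transformations discussed in the paper.

```python
# A minimal forward-mode AD sketch using dual numbers: each value carries a
# tangent, and arithmetic propagates derivatives alongside primals.
class Dual:
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.tan + o.tan)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.tan * o.val + self.val * o.tan)  # product rule
    __rmul__ = __mul__

def f(x, y):
    return x * x * y + y + 3

# Derivative wrt x at (2, 5): seed x's tangent with 1.
print(f(Dual(2.0, 1.0), Dual(5.0)).tan)   # d/dx (x^2 y) = 2xy = 20.0
```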
With the rise of task-specific pre-training objectives, abstractive summarization models like PEGASUS offer appealing zero-shot performance on downstream summarization tasks. However, the performance of such unsupervised models still lags significantly behind their supervised counterparts. Similarly to the supervised setup, we notice a very high variance in quality among summary candidates from these models whereas only one candidate is kept as the summary output. In this paper, we propose to re-rank summary candidates in an unsupervised manner, aiming to close the performance gap between unsupervised and supervised models. Our approach improves the pre-trained unsupervised PEGASUS by 4.37% to 7.27% relative mean ROUGE across four widely-adopted summarization benchmarks, and achieves relative gains of 7.51% (up to 23.73%) averaged over 30 transfer setups.
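The re-ranking step itself is simple; the sketch below scores candidates against the source with a token-overlap heuristic, a deliberately crude stand-in for the unsupervised features used in practice. Candidates would come from sampling a pre-trained model such as PEGASUS.

```python
# A hedged sketch of unsupervised candidate re-ranking: score each summary
# candidate against the source and keep the best. Token overlap is a simple
# stand-in for the richer feature set a real re-ranker would use.
def overlap_score(candidate, source):
    cand, src = set(candidate.lower().split()), set(source.lower().split())
    return len(cand & src) / max(len(cand), 1)     # fraction of grounded tokens

def rerank(candidates, source):
    return max(candidates, key=lambda c: overlap_score(c, source))

source = "The rover landed safely on Mars and began sending back images."
candidates = [
    "A spacecraft exploded during launch.",          # hallucinated
    "The rover landed on Mars and sent images back.",
    "Scientists celebrated the mission.",
]
print(rerank(candidates, source))
```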
While the problem of hallucinations in neural machine translation has long been recognized, progress on alleviating it has so far been very limited. Indeed, it recently turned out that, without artificially encouraging models to hallucinate, previously existing methods fall short, and even the standard sequence log-probability is more informative. This means that characteristics internal to the model can provide much more information than we expect, and before using external models and measures, we first need to ask: how far can we go if we use nothing but the translation model itself? We propose a method that evaluates the percentage of the source contribution to a generated translation. Intuitively, hallucinations are translations "detached" from the source, hence they can be identified by a low source contribution. This method improves detection accuracy for the most severe hallucinations by a factor of 2 and is able to alleviate hallucinations at test time on par with the previous best approach that relies on external models. Next, if we move beyond internal model characteristics and allow external tools, we show that using sentence similarity from cross-lingual embeddings further improves these results.
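A schematic of the detection rule: given per-token contributions attributed to the source (for example, from a token-attribution method), flag translations whose mean source contribution falls below a threshold. The numbers below are toy values, not outputs of a real attribution method.

```python
# A minimal sketch of source-contribution-based hallucination detection:
# translations with low mean source contribution are flagged as "detached"
# from the source. Contributions and threshold are toy assumptions.
import numpy as np

def source_contribution_score(contrib_per_token):
    """Mean fraction of each generated token explained by the source."""
    return float(np.mean(contrib_per_token))

def is_hallucination(contrib_per_token, threshold=0.4):
    return source_contribution_score(contrib_per_token) < threshold

faithful = [0.7, 0.6, 0.8, 0.65]   # translation driven by the source
detached = [0.2, 0.1, 0.15, 0.3]   # mostly driven by the target prefix
print(is_hallucination(faithful), is_hallucination(detached))  # False True
```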